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Abstract 

We propose here some new sampling algorithms for Path Sampling in the case when 
stochastic dynamics are used. In particular, we present a new proposal function for equilibrium 
sampling of paths with a Monte-Carlo dynamics (the so-called "brownian tube" proposal). 
This proposal is based on the continuity of the dynamics with respect to the random forcing, 
and generalizes all previous approaches when stochastic dynamics are used. The efficiency of 
this proposal is demonstrated using some measure of decorrelation in path space. We also 
discuss a switching strategy that allows to transform ensemble of paths at a finite rate while 
remaining at equilibrium, in contrast with the usual Jarzynski like switching. This switching is 
very interesting to sample constrained paths starting from unconstrained paths, or to perform 
simulated annealing in a rigorous way. 

1 Introduction 

The behavior of many systems in the fields of physics, chemistry and biology, is dictated by rare 
but important transitions between metastable states. Usually, only some local exploration of the 
metastable sets can be performed, and it is very difficult to study the transitions by resorting to 
straightforward simulations - using for example molecular dynamics or kinetic Monte-Carlo. The 
Transition Path Sampling (TPS) formalism, first proposed in [23| and further developped in [11] 
(see also (5J [13] for extensive reviews), is a strategy to sample only those paths that lead to a 
transition. It also gives some information on the transition kinetics, such as the rate constant as 
a function of time or the activation energies [10J. Recent practical and theoretical developments 
(such as Transition Interface Sampling [3H [33]) are still aiming at increasing the power of the 
method. State of the art applications of path sampling, such as [3], now involve as much as 15, 000 
atoms with paths as long as 10 ns. 

Recently, relying on the Jarzynski formula fTS] [19] (roughly speaking, an exponential average 
over the works performed during the switching from an initial to a final state), path sampling 
techniques have also been used to compute free energy differences more efficiently [30], [36l [22] by 
precisely enhancing the paths that have the larger weights (which correspond to the unlikely lower 
work values). 

Many path sampling studies (especially TPS studies) have used deterministic dynamics (Path 
sampling in the NVE ensemble has already been thoroughly studied, see [13] for a review). How- 
ever, path sampling with stochastic dynamics is of great interest for nonequilibrium simulations [9]. 
Besides, some models are stochastic by nature (see e.g [I] where the authors consider a model sys- 
tem of protein pulling in implicit solvent, and a chemical reaction simulated with kinetic Monte 
Carlo). Finally, we believe that there is room for improvement in the path sampling techniques 
for stochastic dynamics. We therefore restrict ourselves to the stochastic setting in this study. 
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To this date, the usual equilibrium sampling of paths with stochastic dynamics is done either 
with the usual shooting dynamics inspired from the corresponding algorithm for deterministic 
paths [13]; or with the so-called "noise history" algorithm introduced in [9], which relies on the 
description of paths as a starting point and the sequence of random numbers used to generate the 
trajectory. It is one of our aims here to relate both strategies and generalize them by introducing 
a new way to propose paths: namely by generating random numbers correlated with the ones 
used to generate the previous path. When the correlation is zero, the usual shooting dynamics is 
recovered. When the correlation is one everywhere except for some index along the path where it is 
zero, the noise-history algorithm is recovered. This generalization may be useful for example when 
the dynamics are too diffusive (Langevin dynamics in the high friction limit) since the shooting 
dynamics are inefficient in this limit; or to enhance the decorrelation of the paths generated using 
the noise history algorithm. 

We also consider nonequilibrium sampling of paths, using some switching dynamics on paths |15j . 
inspired from the now well-known Jarzynski out-of-equilibrium switching in phase-space [!"""[ [19] . 
This switching can be performed whatever the underlying dynamics on paths. It can be used to 
transform a sample of unconstrained paths to reactive paths (ending up in some given region). 
This approach was already followed in [15], and allows to compute rate constants. However, the 
final sample of paths is very degenerate, and cannot be used as a reliable equilibrium sample of 
reactive paths. In the same vein, one could imagine doing simulated annealing on paths (simulated 
tempering on paths has already been investigated in [31]), in order to obtain typical transition 
paths at temperatures where direct sampling is not feasible. However, unless the annealing process 
is very slow, the final sample is usually not correctly distributed. We therefore also present the 
application to path sampling of a birth/death process, the so-called "Interacting Particle System" 
(IPS), already used in [26] in the field of molecular dynamics to compute regular phase space prop- 
erties. This methodology is widely used in the fields of Quantum Monte Carlo [U [25] or Bayesian 
Statistics, where it is referred to as Sequential Monte Carlo [H]. It allows, through some selection 
of the paths during the nonequilibrium switching at a finite rate, to precisely reequilibrate the 
paths distribution at all times. Such a reequilibration is of paramount importance for the end 
sample to be distributed according to the canonical measure on paths. Besides, since the sample 
of paths follows the canonical distribution at all times, the properties of interest can be computed 
in a single simulation for a whole range of values. For example, the rate constant could be obtained 
for a whole range of temperatures, which allows to compute the activation energy following the 
method presented in [TO] . 

The paper is organized as follows. We first present the path ensemble in section and turn 
to equilibrium sampling of paths in section [3[ We introduce in particular in section 13.31 the 
"brownian tube" proposal function which generalizes the previous algorithms for path sampling 
with stochastic dynamics, and compare this new proposal functions to the previous ones using some 
two-level sampling indicators (for the local sampling, see section [3741 where an abstract measure 
of diffusion in path space is introduced) . Finally, we present in section [4] the switching dynamics 
on paths, with the IPS extension enabling a reequilibration of the paths distribution at all times, 
even when the switching is done at a finite rate (see section [4*72)1 . 

2 The path ensemble with stochastic dynamics 
2.1 The canonical measure on discretized paths 

We consider a system of N particles, with mass matrix M = Diag(?ni, . . . , m^v), described by a 
configuration variable q = (qi, . . . , gjv), and a momentum variable p = (pi, . . . ,Pn)- The dimension 
of the space is denoted by d, so that qi,Pi £ R d for all 1 < i < N, We consider stochastic dynamics 
of the form 

dX t = b(X t ) dt + EdW t , (1) 

where the variable X t represents either the configurational part qt , or the full phase space variables 
(q t ,p t ). The function b is the force field, the matrix £ is the magnitude of the random forcing, 
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and W t is a standard Brownian motion (the dimension of W t depending on the dynamics used) . 

We restrict ourselves in this study to the most famous stochastic dynamics used in practice, 
namely the Langevin dynamics 

dq t =M~ 1 p t dt, , , 

dp t = -VV(q t ) dt - jM^pt dt + a dW t , [ ' 

where W t denotes a standard <i/V-dimensional Brownian motion, and with the fluctuation-dissipation 
relation a 2 = 2~f/(3. In this case, the variable x = {q,p) describes the system and the energy is 
given by the Hamiltonian E{x) = H(q,p) = V(q) + ^p T M -1 p. Some studies (see e.g. [36J) however 
resort to the overdamped Langevin dynamics 

dq t = -VV(q t )dt + ^dW t , 

in which case x = q and E(x) = V(q). The ideas presented in the sequel can of course be 
straightforwardly extended to this case. 

In practice, the dynamics have to be discretized. Considering a time step At and a trajectory 
length T = LAt, a discrete trajectory is then defined through the sequence 

x = (x , . . . ,x L ). 

Its weight is 

L-1 

n( x ) = z l 1 p( x o) p(Xi,X i+1 ), (3) 



where p(xo) = Z^ 1 c~ l3E ^ x °" > is the Boltzmann weight of the initial configuration, p(xi,Xi + i) is 
the probability that the system is in the state 2^+1 conditionally that it starts from Xi, and Zl 
is a normalization constant. This conditional probability depends on the discretization of the 
dynamics used. 

Denoting by l\(x), 1b (x) the indicator functions of some sets A, B defining respectively the 
initial and the final states, the probability of a given reactive path between the sets A and B is 
then 

L-1 

kab{x) = Z^ x B l A {xo)p(xo) JJ p(xi,x i+1 )l-B(x L )- (4) 

i=0 

Transition Path Sampling [TTJ [13] aims at sampling the measur^l ttab , using in particular Monte- 
Carlo moves of Metropolis-Hastings type. 

2.2 Discretization of the dynamics 

We present here a possible discretization of the Langevin dynamics, and the corresponding tran- 
sition probability p(xi, Xi+i). This discretization, called "Langevin Impulse" [27], relies on an 
operator splitting technique, and is more appealing from a theoretical viewpoint than previous 
discretizations (such as the BBK algorithm [6], or schemes proposed in [2]). For particles of equal 
masses (up to a rescaling of time, M = Id; the extension to the general case is straightforward), 
the numerical scheme we use here reads [27] : 

Pi+1/2 = Pi ~ -yVVfe), 

qi+i = Qi + Cip i+1 / 2 + Ui,i, (5) 

At , . 
p i+ i = c p i+1/2 - — W{q l+ i) + U 2 ,i, 



1 Notice that the measure ttab = ^ab* depends in fact explicitely on the length of the paths, and of the time 
steps used in practice. 
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with 



c = cxp(-7At), ci 



1 — exp(— "/At) 



The centered gaussian random variables (U\,i, Ui.i) with Uus — {u\ A , . . . , uf^) are such that 



with 



At 



2 - 



3 _ 4e - 7 At + e -2 7 At\ 



-/At 



J, -I = ~p (1 " e^ At ) , c 12 a lCT2 = ^(l-e-^)' 



In practice, the random vectors {U\i,Uii) are computed from standard gaussian random numbers 
(G M ,G 2 , ?; ) with G M = (ffi A: ' 



°1 flli) U 2,i = °"2 I \/ 1 - cf 2 ffL + Ci2 3i, 



(6) 



We will always denote by G standard gaussian random vectors in the sequel, whereas the notation 
U refers to non-standard gaussian random vectors. 
Denoting by 



di = di((q i+ i,p i+ i), (qi,Pi)) = 



d 2 ((ft+i,P 4 +i), (Qi,Pi)) 



At . . 

qi+i - q% - ci Pt + ci — W{q.i) 



At 



Pi+i - coPi + — (c W(ft) + V(q i+1 )) 



the conditional probability p((<7i+i,Pt+i), (<?i,Pi)) to be in the state Xj+i = (qi+ijPi+i) starting 
from Xi = (qi,Pi) reads 



p(x i+ i,Xi) = Z 1 exp 



■2o^m(£) + (l) - 2ci2 (^)(l 



(7) 



where the normalization constant is Z = I 27r<7i<7 2 \/l c i 



3 Equilibrium sampling of the path ensemble 

The most popular way to sample paths is to resort to a Metropolis-Hastings scheme |20[ ITS] . 
Other approaches may be considered in some cases , see [13] for a review of alternative approaches. 
Those approaches however require some force evaluation (see e.g. [UJ for a Langevin dynamics 
in phase space in the case of a toy two-dimensional problem) . But the force exerted on a path is 
proportional to V(hi7r), and is difficult to compute in general since it requires the evaluation of 
second derivatives of the potential in conventional phase space. 

We first recall the general structure of the Metropolis-Hastings algorithm, and precise some 
of its specifities, especially when sampling reactive paths. We then recall a usual technique to 
propose paths in section l3~2l and generalize it in section l3~3l We finally propose some benchmarks 
to compare the efficiencies of all these proposal functions. 



3.1 Metropolis-Hastings sampling techniques for path sampling 

For a probability measure ir on the discretized path ensemble (such as (|3|) or (J4j) ) , a Metropolis- 
Hastings scheme is defined as a Markov chain with transition probability kernel 

P(x, dy) = r(x, y)V(x, y) dy+fl- f r(x, y')V(x, y') dy') 8 X , (8) 
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where the density r(x, •) is given by 



r(x, y) = min I 1, 



n{y)V(y,x) 

Tr(x)V(x,y) 



) 



(9) 



The function V is the proposal function (It is more commonly called 'generation probability' in 
the field of molecular simulation). In words, the path y is proposed with probability V(x,y) from 
x, and accepted with probability r(x,y), rejected otherwise. 

The measure n is by construction an invariant measure of the corresponding Markov chain, 
and P is the transition kernel. For all n > 0, P n (x, A) is the probability to reach the set A in n 
steps starting from x (recall that P n (x, •) is a probability measure for all n > 0). If for all x, y, 
there exists n > 1 such that P n {x,y) > 0, and the chain is aperiodic, then the Markov chain is 
ergodic [21] (We refer to [7] for an introduction to ergodicity issues for sampling schemes in the 
field of Molecular Dynamics). 

The key point in all Metropolis-Hastings schemes is to find an efficient proposal function. In 
particular, there is always a trade-off between the acceptance and the decorrelation rate of the 
Markov chain. Indeed, if the acceptance rate is low, the obtained sample is degenerate, and 
not statistically confident. On the other hand, to increase the acceptance rate, more correlated 
iterations can be used. In this case the method is more likely to remain trapped in local minima, 
and the numerical ergodicity rate may be slow. In many situations, the optimal acceptance rate 
is around 1/2. This heuristic rule can be made rigorous in some situations (see e.g. [2l] where the 
optimal acceptance rate is shown to be 0.574 for a specific Metropolis-Hastings scheme based on a 
Euler-Maruyama proposition, in the limit when the dimension of the phase-space goes to infinity). 

In the case of reactive paths, a study of the acceptance rate asks to decompose the accep- 
tance/rejection procedure in two successive steps: (i) the proposition of a path starting from A 
and going to B; (ii) the acceptance or rejection of such a path according to the Metropolis-Hastings 
scheme. The difficult step is the first one, since paths bridging A and B are only a (small) sub- 
set of the whole path space. In particular, diffusive dynamics such as the overdamped Langevin 
dynamics are often not convenient to propose bridging paths; the situtation is however better for 
dynamics with some inertia, such as the Langevin dynamics. When the paths are constructed 
using deterministic dynamics (NVE case), some studies have shown that the optimal acceptance 
rate is about 40 % for the cases under consideration [T3] , 

For path sampling with stochastic dynamics, the "shooting" proposal function is classically 
used [13]. However, even for moderate values of the friction coefficient 7 in the Langevin dynamics, 
this proposal function may have low acceptance rates, especially if the dimension of the system is 
high or/and the barriers to cross are large. An alternative way of proposing paths, relying on the 
so-called "noise history" of the paths [9] {i.e. the sequence of random numbers used to generate 
the trajectory from a given starting point) is to change only one of the random numbers used and 
to keep the others. In this case, a high acceptance rate is expected, but the paths generated may 
be very correlated. 

A natural generalization of both approaches is to rely on the continuity of the dynamics with 
respect to the random noise forcing, and to propose a new trajectory by generating new random 
numbers correlated with the previous one. We call this approach the "brownian tube" proposal. 
In this case, an arbitrary acceptance rate can be reached, and there is room for optimizing the 
parameters in order to really tune the efficiency of the sampling. 

3.2 The shooting proposal function 

The shooting technique described in [i~3l section 3.1.5] consists in the three following steps, starting 
from a path x n : 

• select an index < i < L according to discrete probabilities (wi)o<i<i (for example a 
uniform probability distribution can be considered, unless one wants to increase trial moves 
starting from certain regions, for example the assumed transition region); 
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• generate a new path (i/j+i, ■ ■ ■ , ijl) forward in time, using the stochastic dynamics ((2|), with 
a new set of independently and identically distributed (i.i.d.) gaussian random numbers 

(Uj 1+1 ) 1+ i<j<l-i; 

• generate a new path (j/i— 1, ■ ■ ■ , yo) backward in time, using a discretized "backward" stochas- 

tic dynamics correponding to ([2]), with a new set of i.i.d. gaussian random numbers (Uj )o<j<i-i, 

• set x n+1 = y with probability r(x n , y), otherwise set x n+1 = x n . 

The "backward" part of the trajectory can be computed using some backward integration 
(resorting to negative time steps), but the associated schemes are often unstable [28]. Therefore, a 
more appropriate method is to resort to time reversal: The forward dynamics are used to generate 
the points yi from yj+i in a time-reversed manner. This means that variables odd with respect 
to time reversal (such as momenta) are inverted, and variables even with respect to time reversal 
(such as positions) are kept constant. Denoting by S the reversal operator, Syi = (<&, —pi) when 
Vi = (lii Pi) f° r Langevin dynamics. The usual one-step integrator $a* is then considered to 
integrate the corresponding trajectory: 

yi = (So $ At o% +1 . 

The time-reversed conditional probability pxR(j/i+i, yi) to go from yi+i to yi is 

P(y»+i,3/i) = pTR(2/i+i,Z/i) = p(Syi+i,Syi). 

We will always denote in the sequel the random numbers used in this process by U. The probability 
of generating a path y = (yo, . . . , y£) from x, shooting forward and backward from the z-th index, 
is then 

i-1 L 

7 > {x,y) = Y[p(yj+i,yj) II p(%-1'%)- ( 10 ) 

3=0 j=i+l 

Notice that the previous path x is present only through the term = Xi. It then follows 

r(x, y) = min (1, lA(yo)lB(yL)c cxa ct(£, y)) , 

with 

(„ ,a - P(V°) TT Pfe^J+i) P(xj+i,Xj) nr , 
P(xo) fJ- P(Vj+l,Vj) p(Xj,X j+1 ) 

It is clear that, for reasonable discretizations, P 2 (x, y) > for all paths x, y of positive probabil- 
ity (under mild assumptions on the potential) so that the correponding Markov chain is irreducible. 
Since the measure ([4]) is left invariant by the dynamics (this is a classical property of Metropolis- 
Hastings scheme), the corresponding Markov chain is ergodic [21]. Notice also that it is enough to 
consider only the forward or the backward integration steps for the ergodicity to hold, as long as 
both have a positive probability to occur (and that the possible asymmetry in the corresponding 
probabilities is accounted for). 

In some cases, the microscopic reversibility ratio 

o / \ p(yi)v(yi,Vi+i) 

Rrev(yi,Vi+\) = —, T-T7 r 

is close to 1, so that c oxac t(a;,y) ~ 1 and the acceptance/rejection step is greatly simplified. 
However, this assumption should always be checked carefully using some preliminary runs since it 
is sometimes the case that, even if the reversibility ratio R lcv is close to 1 pointwise (with a good 
approximation), it may be false that c exstc t(x,y) — 1 along the path, especially if the paths are 
long (see [28j for a more systematic study of this point). 
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3.3 The brownian tube proposal function 

A path can also be characterized uniquely by the initial point xq and the realization of the brownian 
process W% in ([!]). When discretized, the paths are then uniquely determined by the sequence of 
standard gaussian random vectors U = {Uq, . . . , Ul-i) used to generate the trajectories using §5§ 
(or any discretization of another SDE). This was already noted in [9j, where a new trajectory was 
proposed selecting an index at random and changing only the gaussian random number associated 
with this index. 

Since the trajectory is continuous with respect to the realizations of the brownian motion, any 
convenient small perturbation of the sequence of random numbers is expected to generate a path 
close to the initial path. Still denoting by p(xi,Xi + i) the probability to generate a point Xi + \ in 
phase-space starting from Xi, using the gaussian random vectors Ui and Ui obtained from standard 
gaussian random vectors Gi and Gi, the transition probabilities for all classical discretizations we 
consider can be writtten as 

p(xi,x l+1 ) = Z^cxp ^-^GfTG^j , p T R{xi+i,Xi) = Z _1 exp (-^GfTGi 

where Z is a normalization constant. In the case of the discretization ((5J of the Langevin equation 
for example, T = V T V where the matrix V allows to recast the correlated gaussian random 
vectors Ui = (U\^,U2,i) (or Ui) as standard and independent gaussian random numbers Gi (or 
Gi) through the transformation Ui = VGi (or Ui = VGi) with (see Eq. (7])) 

/ (Ti 1 ld dN 

V = gig T j 1 T 1 

The idea is then to modify the standard gaussian vectors Gi by an amount < on < 1 as 



Gi = a i G i + yJl-o$R i (12) 

where Ri is a 2cW-dimensional standard gaussian random vector. A fraction ctj is associated with 
each configuration Xi along the path. The usual shooting dynamics is recovered with on = for 
all i (all the Brownian increments are uncorrelated with respect to the Brownian increments of 
the modified path), whereas the so-called 'noise history' algorithm proposed in |9| corresponds to 
on = for all i but one io for which cti = 1 (in this case, all the Brownian increments but one are 
re-used). 

The dynamics we propose looks like the shooting dynamics: first, a position < k < L along 
the path is chosen at random; a coefficient oii is then associated to each configuration along the 
path, and a random gaussian vector is proposed starting from the previous one using (fl2|) : finally, 
the corresponding trajectory is integrated forward from the fc-th configuration to the L-th, and 
time-reversed from the fc-th to the first, and an acceptance/rejection step is done according to ©. 

It only remains to precise the proposition function V(x, y). Denoting by (Gf )o<i<k-ii (Gf)k<i<L-i 
the standard random gaussian vectors associated with the path x (the first ones arise from the 
time reversed integration, the last ones from a usual foward integration), it follows 

V(x,y) = w k J] Pai (Gf, Gf) ft p ai (Gf,G?), 

0<i<k-l k<i<L-l 

where Wk denotes the probability to choose fc as a shooting index, and 

- / 1 \ " ( (G - aG) T (G - aG) \ 
Pa{G ' G)= { V2n(l-a>) ) CXP ( W^) 

A tuning of the coefficients ctj can then be performed in order to get the best trade-off between 
acceptance (which tends to 1 in the limit a; = 1 for all i) and decorrelation (which arises in the 
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limit on — > 0). An interesting idea could be that a has to be close to 1 in regions where the 
generating moves have a chaotic behavior (in the sense that even small perturbations to a path 
lead to large changes to this path), and could be smaller in regions where the generating moves 
have less impact on the paths (so as to increase the decorrelation) . From a more practical point 
of view, possible approaches to obtain such a trade-off are to propose a functional form for the 
coefficients on and to perform short computations to optimize the parameters with respect to 
some objective function. Some simple choices for the form of the coefficients on, involving only 
one parameter (so that the optimization is procedure is easier), are: 

• constant coefficients o>i = a; 

• set on = 1 far from the shooting index, and on close to near the shooting index. This can 
be done by considering on = mm(l, K\i — k\) for some K > 0. 

From our experience, the efficiency is robust enough with respect to the choice of the decorrelation 
coefficients a;. Notice also that the second functional form allows to recover both the usual 
shooting and the noise-history algorithm, respectively in the regimes K — > and K > 1. It is 
therefore expected that, optimizing the efficiency with respect to K g [0,1], both the shooting 
algorithm and the noise-history algorithm should be outperformed. 

3.4 Intrinsic measure of efficiency 

Our aim here is to propose some abstract measure of decorrelation between the paths, so as to 
measure some diffusion in path space. This approach complements the convergence tests based 
on some observable of interest for the system. We refer to [13] for some examples of relevant 
quantities to monitor (and applications to path sampling with deterministic dynamics), and to 
section [331 for some numerical results for stochastic dynamics. 

The intrinsic decorrelation is related to the existence of some distance or norm on path space. 
Given a distance function d(x, y), the quantity 



(with p > 1) precises the average amount of decorrelation with respect to the distance d for the 
measure ir on the path ensemble. Notice that two averages are taken: one over the initial paths 
x, and another over all the realizations of the Monte Carlo iterations starting from x {i.e. over all 
the possible end paths y, weighted by the probability to end up in y starting from x). In practice, 
assuming ergodicity, D p (n) is computed as 



Usual choices for p are p = 1 or p = 2. This last case is considered in [8] since a diffusive behavior 
over the space is expected with stochastic dynamics, the most efficient algorithms having the 
largest diffusion constants lim„^ +00 y/D^nf/n. 

It then only remains to precise the distance d, which depends on the system of interest. Some 
simple choices are to 

• consider a (weighted) norm 1 1 • 1 1 on the whole underlying phase-space (for position or posi- 
tion/momenta variables) and set 






with p' > 1 ; 
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• consider only a projection of the configurations onto some submanifold, such as the level sets 
of a given (not necessarily completely relevant) reaction coordinate or order parameter £ : 



d(x,v)=U^Ui\t(*i)-ttVi)\A 



i/p' 



with p' > 1. 



• align the paths projected onto some submanifold around a given value of the reaction coor- 
dinate £: 

W 

d(x,y) 



2K 



1 K A 

— ]T <Oi\fai+i)-Z(yj+i)\ p } > (13) 

i=-K ) 

with p' > 1, and I, J such that £(xi) = £(xj) = £* where £* is fixed in advance (for 
example, if A is characterized by £ = and £? by £ = 1, then £* could be 1/2). The integer 
7^ represents some maximal window frame so that the distance is really restricted to a region 
around the expected or assumed transition point. In the case when J — K,I — K < or 
J + K, I + K > L, the sum is accordingly restricted to less than 2K + 1 points. 

The weights a;, should be non-negative in all cases. 

A reasonable choice for non-trivial systems is for example to use (fT3| with p' = 1 and Wi = 1 . 
This approach ensures that the decorrelations arising in the initial and final basins A and B are 
discarded, and that only the decorrelation arising near the transition region are important. In 
this sense, we term this decorrelation as 'local decorrelation' since we measure how different the 
transition mechanisms are. As a measure of 'global decorrelation', we will consider the transition 
times. A numerical study based on those lines is presented in section [3~ol 



3.5 Numerical results 



Wc test the different proposal functions on a model system of conformational changes influenced 
by solvation. We consider a system composed of N particles in a periodic box of side length Iq, 
interacting through the purely repulsive WCA pair potential [T2l 129] : 



VwcaM 



4r 







e if r < ro, 
if r > r , 



where r denotes the distance between two particles, e and a are two positive parameters and 
r = 2 1 / 6 er. Among these particles, two (labeled 1 and 2 in the following) are designated to form 
a dimer while the remaining particles are solvent particles. Instead of the above WCA potential, 
the interaction potential between the particles in the dimer is a double-well potential 
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(r - r - wf 



where h and w are two positive parameters. The potential Vbw exhibits two energy minima, one 
corresponding to the compact state where the bond length of the solute dimer is r = r$ , and one 
corresponding to the stretched state where the bond length of the solute dimer is r(q) = ro + 2w. 
The energy barrier separating both states is h. Figure [T] presents a schematic view of the system. 

We consider the distance <(T3|l for reactive paths (tt = ttab in this case), using p = p' = 1 
and uJi — 1, = \q± — 92 1, C = r o + w. We use the parameters L = 500 At, /3 = 1, N = 16 
particles of masses 1, Iq = 1.3, a = 1, e = 1, w = 0.5, At = 0.0025, with the sets A = {£,(q) < 
ro + 0.6w}, B = {£(<?) > tb = + lAw} and averaging over a total of n = 5 x 10 4 Monte Carlo 
moves. We set K = 30 since the typical length of the transitions is about 60 time steps with the 
parameters used here. 
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Figure 1: Schematic views of the system, when the diatomic molecule is in the compact state 
(Left), and in the stretched state (Right). The interaction of the atoms forming the molecule is 
described by a double well potential, all the other interactions are of WCA form. 



h 


5 10 15 


Shooting 
Noise history 
Brownian tube (a, = 0.8) 


24.4 18.1 15.2 
96.7 85.7 81.2 
47.2 48.1 33.0 



Table 1: Acception rate (%) as a function of h for the three proposal functions considered. 



We also consider the correlation in the transition times. We denote by t(x) the transition 
index of some path x. Here, those indexes r are such that £(q T At) = C ■ The correlation function 
for this observable is therefore, in the case of reactive paths, 

/ (r(y) - {r)rr AB )(T{x) - (T) VAB )P n (x,dy)dTT AB (x) 

C(n) = LJ. , 

/ (t(x) - (t)x 4b ) 2 dir A B(x) 

with (t) 7TAB = J r{x)d-KAB{x) This observable is in some sense complementary to the measure of 
decorrelation in the transition zone defined above since it measures some global spatial decorrela- 
tion of the paths. In practice, assuming ergodicity, C is approximated as 



C(n) = 




Figures [2] to |4] present some plots of D(n) and C(n) for h = 5, 10, 15, for the usual shooting 
dynamics, the noise-history algorithm, and the brownian tube proposal (with ati = 0.8 for all i). 
The average acceptance rates are also presented in Table [TJ Notice that no shifting moves [l3j are 
used in order to compare the intrinsic efficiencies of the proposal functions. It is likely that these 
moves would help improving the decorrelation rate of the sampling. 

For the shooting algorithm, many paths are rejected so that the local decorrelation (measured 
by D(n)) is rather poor, especially at short algorithmic times and for high barriers (in any cases, 
lower than for the brownian tube proposal). But when a path is accepted, it is already very 
decorrelated from the previous one, so that the global decorrelation (measured by C(n)) is indeed 
decreasing rapidly enough. For the noise-history algorithm, the picture is somewhat inverted: 
since the acceptance rate is very high, even for high barriers, the local decorrelation is quite 
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Figure 2: Comparison of efficiencies for different Metropolis-Hastings proposal moves for h = 
5. Left: Plot of the correlation of the transition times C(n) (related to some global sampling 
efficiency). Right: Plot of D(n) (local sampling efficiency) for the brownian tube proposal with 
a = 0.8 (solid line), usual shooting dynamics (dashed line), and noise history (dotted line). 




10 20 30 40 50 60 70 80 90 100 5 10 15 20 25 30 

Iteration index n Iteration index n 



Figure 3: Comparison of efficiencies for different Metropolis-Hastings proposal moves for h = 10. 



efficient, but the global decorrelation is not since small local changes make it difficult to change 
the global features of the paths. The brownian tube approach tries to balance the local and global 
decorrelations. This is also reflected by a more balanced acceptance/rejection rate. 

In conclusion, the brownian tube proposal with the above correlation function is the most 
efficient sampling scheme in the case considered here. The efficiency could be further increased 
by a more systematic tuning of the parameters of the correlation factors a, , possibly depending 
on the shooting index k. In general, since the usual proposal functions are specific cases of the 
brownian tube proposal function, it is expected that there is always a parameter range such that 
this new algorithm outperforms the previous ones. 



4 (Non) equilibrium sampling of the path ensemble 

The previous section was dealing with equilibrium sampling of paths. However, when (free) energy 
barriers in path space are large, direct sampling of paths can be inefficient, since the existence 
of metastable path sets may considerably slow down the numerical convergence. It is therefore 
appealing to perform some kind of simulated annealing on paths. A regular simulated annealing 
strategy would be to first sample paths at a higher temperature, and then to cool the sample 
to the target temperature (see |31j for a simulated tempering version of such an idea). Reactive 
paths can also be otained by constraining progressively the paths to end up in B. This approach 
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0.05 




Figure 4: Comparison of efficiencies for different Metropolis-Hastings proposal moves for h = 15. 



also has the nice feature that it does not ask for an initial guess to start sampling ttab- Finally, 
a byproduct of such a switching is the ratio of partition functions in path space 

where Za, Zab are such that 

L-1 

ir A (x) = Z A {LAt)~ 1 lA(xo)p(xo) Y\_ p(xi,x i+1 ), 

i=0 

and 

L-1 

kab(x) = Z AB{LAt)~ 1 l A (x )p(x ) Y[ p(x t ,x l+1 )l B (x L ) 

i=0 

are probability measures. The function C in l[T4"f has to be computed at least once to obtain rate 
constants in practice [13]. The associated free-energy difference in path space is AFA^AB(LAt) = 
-\n(C(LAt)). 

We start this section by recalling the extension of the classical switching dynamics for nonequi- 
librium dynamics in phase space to nonequilibrium switching between path ensembles [15]. This 
method is convenient to compute free energy differences, but the final sample of paths obtained 
is very degenerate. We therefore present in section 14.21 the application to path sampling of a 
birth/death process introduced in [25l Efi], which allows to keep the sample at equilibrium at 
all times during the switching. This equilibration may be important in some cases to compute 
the right free energy values [26], and allows in any cases to end up with a non-degenerate sam- 
ple of paths and reduce the empirical variance. We will focus in the sequel on switching from 
constrained to unconstrained paths, but an extension to simulated annealing (cooling process) is 
straightforward. 



4.1 Switching between ensembles of paths 

We present in this section the approach of [15], where the switching from unconstrained to con- 
strained path ensembles is done by enforcing progressively the constraint on the end point of 
the path over a time interval [0,T]. The constraint is usually parametrized using some order 
parameter. This order parameter is the same as the one used for usual computation of reaction 
rates in the TPS framework (and even for more advanced techniques such as Transition Interface 
Sampling (TIS) |34[ I33|). The point is that this approximate order parameter needs not to be a 
"good" reaction coordinate (or a complete one) since the general path sampling approach should 
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help to get rid of some problems arising from a wrong choice of order parameter (see e.g [32] for 
a recent study on this topic). 

Assuming an order parameter is given, we can consider a switching schedule A = (A , . . . , A") 
such that A = and A™ = 1 and a family of functions h\ such that 

ho = 1, hi = 1b- 

We also introduce the family of measures associated with the functions h\ : 

L-l 

tt\{x) = Z~^ x 1 a (x )p(xq) p(xi,x i+ i)h\(x L ). (15) 

i=0 

We omit in the sequel the explicit dependence of the partition functions Z on L and At. An 
energy £\(x) can then formally be associated to a path x as 



7TA 



{x)=Zl\e- £ ^ 



The aim is to sample from iri = ttab , which is usually a difficult task, and sometimes not directly 
feasible. It may be easier to use a sample of ttq — tta (which is much easier to obtain), and to 
transform it through some switching dynamics into a weighted sample of tt\ . Starting from a path 
x k '°, the weight factor for a resulting path x k,n is of the form e~ w ' where W k,n is the work 
exerted on an unconstrained path to constrain it to end in B. We now precise the way the work 
is computed. 

Consider an unconstrained initial path a; = (xq, . . . ,x° L ) sampled according to n , and a 
discrete schedule (A , . . . , A"). The dynamics in path space is as follows: 

Algorithm 4.1 (See Ref. [15]) Starting from W° = and m = 0, 

• Replace X m by A m+1 ; 

• Update the work as W m+1 = W m + E xm +i (x m ) - £ A ™ {x m ); 

• Do a Monte Carlo path sampling move using a Metropolis-Hastings scheme with the measure 
tt x (using for example the usual shooting moves with a Langevin dynamics, or the Monte 
Carlo move designed for path switching presented in Appendix A ), so that the current path 
x m is transformed into the new path x rn+1 . 

This procedure is repeated for independent initial conditions x k '° , so that a sample of M end 
paths {x 1,n , . . . ,x M ' n ) with weights (e~ w ,e~ w ' ) is obtained. Besides, an estimation of 

the rate constant is given by the exponential average 



and it can be shown that Cm — > C when M — > +oo. 

Since the realizations of the switching procedure are independent provided the initial conditions 
are independent, the random variables {e~ w are i.i.d. A confidence interval can be obtained 
for Cm as 

/ M 

C~ M ^ < Cm < C+ >CTc , with C MMc = In - £ e"^' 

V fe=i 

where the empirical variance is 

V M -- 
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A confidence interval on the free energy difference is then 

-InC^ < AFa—>ab < -lnC+ CTc . 

For example, the 95 % confidence interval corresponds to <r c = 1.96. 

Of course, it may the case that the variance of the work distribution is large, so that only 
very few paths are relevant (and the confidence interval for the rate constant is large). Therefore, 
most computational effort is discarded in the end. A method enabling an on the fly sorting out of 
the irrelevant would be an interesting improvement of the method. Such a procedure could also 
concentrate the efforts on important transition tubes. The Interacting Particle Systems (IPS), 
already used in the context of nonequilibrium free energy differences [26], is such an approach. 



4.2 Enhancing the number of relevant paths 

We present here an extension of a birth/death process, introduced for equilibrating a simulated 
annealing process done at finite rate (and therefore out of equilibrium) , to the case of path sam- 
pling. This procedure can be seen as a time continuous resampling, and avoids the degeneracy of 
the paths weights (see also the related population Monte-Carlo algorithms [H]). The idea of IPS 
is to switch several paths in parallel, and to attach exponentially distributed birth and death times 
to each path. The death time of the path is decreased when the work exerted on it is higher than 
the average work; when this time is zero, a new exponentially distributed death time is generated, 
the path is suppressed, and replaced by another path picked up at random among the other paths. 
The birth time of the path is decreased when the work exerted on it is lower than the average 
work; when this time is zero, a new exponentially distributed birth time is generated, another path 
picked up at random is suppressed, and is replaced by the path giving birth. In all cases (birth or 
death), the works attached to a path are kept. We refer to [26] for a proof of the consistency of 
the method. 

Algorithm 4.2 Consider an initial distribution (x 1 ' , . . . , x M '°) generated from ttq. Generate 
independent times r ' , r k ' d from an exponential law of mean 1. Consider two additional variables 
Yr>° , £ fc,d per replica, initialized at 0. 

• Replace X m by X m+1 ; 

• Update the works as W k ' m+1 = W k ' m + A£ k ' m = W k > m £ xm+ i(x k ' m ) - £ Xm (x k ' m ), and 
compute the mean work update A£ = M^ 1 5Zi<fc<A/ 

• (Diffusion step) Do a Monte Carlo path sampling move using a Metropolis-Hastings scheme 
with the measure 7r Am +i, so that x k - ja is transformed into x k,m+1 . 

• (Birth/death process) Update the variables S fc,b and E fe ' as 

E fe,6 = E fe,6 + pf^g™ _ A£ fe ' m )-, 

and 

s fc,d = e m + p(£g m _ A£ fc > m )+. 

(Death) If T, k ' d > r k ' d , select an index m £ {1, . . . , M} at random, and replace the k-th path 
by the m-th path. Generate a new time T k,d from an exponential law of mean 1, and set 

(Birth) If T, k ' b > r k ' b , select an index m £ {1,...,M} at random, and replace the m-th 

path by the k-th path. Generate a new time r k ' h from an exponential law of mean 1, and set 
S fe,6 = 0; 
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M n 


Backward Forward IPS (forward) 


2000 2000 
2000 5000 
2000 10000 
2000 15000 


4.83 (4.61-5.02) 5.43 (5.28-5.61) 4.82 (4.78-5.85) 
5.34 (5.04-5.58) 5.41 (5.32-5.50) 5.19 (5.16-5.23) 
5.45 (5.32-5.58) 5.40 (5.34-5.46) 5.40 (5.36-5.43) 
5.42 (5.35-5.49) 5.40 (5.35-5.45) 5.45 (5.42-5.48) 



Table 2: Free energy differences AFa-*ab computed for different switching lengths n, using a 
sample of M = 2000 paths. The results are presented under the form "Cm {C m a — C M )" with 
a c = 1.96 (the value corresponding to a 95 % confidence interval). 



Then, each path has weight 1 in the end, and the final sample (x 1,n , . . . ,x M ' n ) is distributed 
according to 7Ti = ttab- In this case, an estimation of the rate constant is given by the simple 
average 



M 

C M (LAt) = -Y, Wk ' n 



fe=i 



and it can be shown that Cm — * C when M — > +oo. A confidence interval for the free energy 
difference can be obtained as in section Ej] as 



OK." < ^ AB < C^, with cgf = ^ £ ± a c f-^ , 

k=l 



M 't/Ips 



the empirical variance being 



M / m \ 2 

k=l \ 1=1 / 



4.3 Numerical results 

We compute here the free energy differences while constraining paths to for the WCA model 
system introduced in section 13.51 This is done either with plain nonequilibrium switching, or 
with the IPS equilibration. Let us notice that the energy is fixed in [15] while we rather have to 
fix the temperature in the stochastic setting, so that a straightforward comparison of the results 
is not possible. We set j3 = 1 in the sequel. The other parameters are the same as in |15| : 
N = 9 particles, h = 6, a = 1, e = 1, the particle density p = 0.6cr~ 2 , w = 0.25, and the sets 
A = {{(q) < U = l.3a},B = {£(q) > £ B = 1.45ct}. The trajectory length is L = 320 At and 
At = 0.0025, so that LAt = 0.8(toct 2 /e) 1/2 . 

We perform a total of n MC moves (using the brownian tube proposal function (with «i 
' i = 0.8 for all < i < L — 1). The function h\ is the one given in |15j : 

h x (q) = c - a ^( 1 - 1 b(9))(?b-«9)) 

with K = 100. The switching schedule is A* = {i/n) 2 . 

A typical free energy difference profile is presented in Figure [5] for M = 2000 and n = 10000, 
as well as the associated weights for the plain nonequilibrium switching. These weights are the 
Jarzynski weights renormalized by the total weight (in order to define a probability distribution) : 

e- w "' n , . 

Wk = E fi lG -^ - (16) 

Notice that the sample is very degenerate since very many paths have negligible weights, and the 
relevant paths are exponentially rare. Recall also that the paths all have weight 1 with the IPS 
algorithm. 
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Some free energy differences are presented in Table[2]for different values of n (keeping M fixed). 
The switchings are slow enough when the confidence intervals for free energy differences computed 
by constraining paths agree ('forward' switching) overlap with confidence intervals for free energy 
differences obtained by starting from a sample of constrained paths and removing progressively the 
constraint ('backward' switching). This is the case here for n = 5000, 10000, 15000 (but not when 
n = 2000). The results show that IPS agrees with the usual Jarzynski switching, the confidence 
interval on the results being however lower. 




Figure 5: Left: Free energy profile for a forward switching, computed for M = 2000 and n = 10 4 , 
using a plain nonequilibrium switching. Right: Histogram of the weights Wk of the final sample 
as given by (|X6|> - 

We also present in Figure [6] a final sample computed using a quite fast switching (n — 1000) 
with a small sample of paths (M = 100). Notice that all the 100 paths generated with the IPS 
switching are reactive, in contrast with the paths generated by a straightforward switching in the 
Jarzynski way. Besides, as a consequence of the degeneracy of paths, only 8 paths in 100 have 
a significant weight (larger than 0.05 when normalized by the total weight, as given by (QUI)). 
This simple example shows why it is difficult to compute averages over the final sample of paths 
when performing plain nonequilibrium switching, and why it may be interesting to resort to some 
selection process to prevent such a degeneracy. 

In agreement with a previous study [26], the results show that the IPS algorithm allows to 
reduce the variance on the estimates and to end up the simulation with a well-distributed and 
non-degenerate sample, provided the switching is slow enough. 

5 Conclusion and prospects 

In conclusion, we have presented here some new algorithms for path sampling with stochastic 
dynamics, either equilibrium sampling (wich can be used for the computation of free energy differ- 
ences, or reaction rates), or nonequilibrium sampling (which allows to perform simulated annealing 
in a rigorous manner instead of performing simulated tempering; or to switch from a sample of un- 
constrained paths to a sample of constrained paths, and compute the associated ratio of partition 
functions) . 

The brownian tube proposal used for equilibrium sampling is a simple generalization of the 
previous approaches, and can therefore always be used as a shooting algorithm with only minor 
modifications to existing TPS algorithms. A systematic criterion for setting the correlation factors 
{cxi}i would be to consider simple analytical forms as proposed at the end of section !3.3[ and choose 
{cxi}i to obtain balanced acceptance/rejection rates or, when some specific observable has to be 
computed, to optimize the parameters to obtain the best convergence results (on some preliminary 
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Figure 6: Comparison, for a nonequilibrium switching of paths for M = 100 systems in n = 1000 
steps without (Left) or with IPS (Right). Only the paths having a weight greater than 0.05 are 
plotted in solid lines when plain nonequilibrium switching is used (the other paths are plotted in 
dotted lines). 



computations). However, for simulations of large systems using long paths, the brownian tube 
approach may be impossible to use because of the limited numerical precision and the chaotic 
behavior of the system: indeed, starting from a given path, it is not clear whether this path can 
be recovered by first computing the random numbers associated with the trajectory, and then 
integrating this trajectory again starting from the initial point. 

The equilibration of the nonequilibrium switching dynamics is very infesting to reduce the 
variance of free energy computations when switching from unconstrained to constrained paths, or 
to obtain well-distributed ensemble of paths in the end (which is of paramount importance for 
the correctness of a simulated annealing procedure for example). However, the switching still has 
to be done slowly enough and using a number of replicas large enough. Once again, this may be 
problematic for very large systems. 

It would be interesting now to extend the switching procedure to TIS [34, 33J, where the length 
of paths is not constant, but which is naturally sequential in the way computations are done in 
practice: indeed, the flux through the next intermediate interface is computed using a sample of 
paths crossing the previous interface (this is the major difference with the forward flux techniques 
of [1] where only points on the previous interface are kept). 
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Appendix A: Specific Monte-Carlo moves for switching from 
unconstrained to constrained path ensembles 

When an interpolating function h\ appearing in (fT5"l) (or, equivalently, some order parameter £) 
is known, it is possible to increase the likeliness of the end point of the trajectory by performing 
a move on the last configuration in the direction opposite to Vh\(q) while keeping the random 
numbers used for the transitions. These moves should of course be employed with other MC 
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moves, especially MC moves relying on some trajectory generation, in order to relax the shift 
toward higher values of h\ or £. 

More precisely, using for example an overdamped Langevin dynamics to update the end config- 
uration, the associated Metropolis-Hastings Monte-Carlo elementary step is, starting from a path 
x for a parameter A (in the Langevin dynamics setting): 



Algorithm 5.1 Starting from a path x = (xq, . . . , xl), 

• Compute the sequence of 2d- dimensional noises (Ui)o<i<L-i associated with the backward 
(time-reversed) integration from xl to xq; 

• Compute a final configuration as i/l = xi + 5\V£,(q) + {25\/ (3) 1 / 2 G where G is a dN- 
dimensional random gaussian vector; 

• Integrate the path backward (time-reversed), starting from i/l, using the noises (C/i)o<i<L-i 
to obtain a path y = (yo, ■ ■ ■ , Ul)- The probabilty V{x,y) to obtain y starting from x is 
therefore the probability to obtain y^ from xl, so that 

( f3 \ d/2 ( (3 
V(x,y) = p swi tch{x L,yL) = y^—^J ex P y~4§2 \ yh ~ XL ~ ^V£(<?)| 2 



Tt(x)P(x 7 y) J \ ' l A (x )p(x ) p sw itch(a;L,2/L) 

The magnitude 5\ can be made to depend a priori on A. It is then adjusted in pratice on the 
fly by first computing the values of the gradient for the endpoint of each replica, in order to ensure 
that the displacement is small enough. 



Accept the new path y with proba 

/ \ . (, n(y)'P(y,x) \ . ( l A (y )p(y ) Pswitch(j/i,2;L) 
r{x, y) = mm 1, I — mm I 1, 
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